Introduction to Seaborn in Python

Seaborn is a Python library for statistical plotting. It is built on top of Matplotlib, so a working knowledge of Matplotlib is the base for learning Seaborn. Seaborn adds higher-level plot types and built-in styles on top of what Matplotlib offers, and it is designed to work very well with pandas DataFrame objects. If Jupyter Notebook was installed through Anaconda, Seaborn is already included; otherwise it can be installed with pip install seaborn. Let us import the Seaborn library into the notebook.

Topics covered in the last tutorial:

  • The Matplotlib library
  • Plotting using Matplotlib
  • Functional method of plotting
  • Object-oriented method of plotting

Seaborn in Python:

Importing the library alone is not enough to display plots in a Jupyter notebook. To show plots inline, we have to run the same magic command that we used while working with Matplotlib: %matplotlib inline.
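A minimal sketch of the import cell is shown here. The alias sns is the conventional one and is the name used by the later snippets in this tutorial. The %matplotlib inline magic is left as a comment because it only works inside a Jupyter/IPython cell, and it is an assumption that this is the command used in the previous tutorial.

```python
# In a Jupyter notebook cell, uncomment the next line so plots render inline.
# It is an IPython magic command, not plain Python, so it fails in a .py script.
# %matplotlib inline

import matplotlib.pyplot as plt  # Seaborn draws its plots through Matplotlib
import seaborn as sns            # "sns" is the conventional alias

# Printing the version is a quick way to confirm that the import worked.
print(sns.__version__)
```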

Here we will be working with a dataset named ‘tips’, which records information about customers who gave a tip to the waiters in a restaurant.

Let us learn how to import the dataset using seaborn.
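One way to do this is Seaborn's load_dataset function. It downloads the example dataset from Seaborn's online data repository the first time it is called (so an internet connection is needed) and returns a pandas DataFrame. The variable name a is chosen only to match the later snippets in this tutorial.

```python
import seaborn as sns

# Download (or read from the local cache) the example "tips" dataset.
a = sns.load_dataset("tips")

# Inspect the first five rows and the type of each column.
print(a.head())
print(a.dtypes)
```

The numerical column tip is the one plotted in the rest of this tutorial.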

In these Seaborn tutorials we will be working with different types of graphical representations, such as

  • Histograms
  • Heatmaps
  • Linear model plots (covered in more depth in the machine learning tutorials)
  • Box plots
  • Violin plots
  • Scatter plots
  • Hexbin plots
  • Matrix plots
  • And much more

Let us start by plotting a histogram of the numerical column ‘tip’ from the dataset we imported.

Syntax: sns.distplot(dataset_name['numeric_col_label'])

Along with the histogram, this plot also draws a curve known as the KDE (kernel density estimate). To remove this curve, we have to pass kde=False to the distplot function.
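The two calls can be sketched side by side so that the effect of kde=False is visible. This assumes the dataset was loaded into a variable a with load_dataset; the two-panel layout is only for comparison. On Seaborn 0.11 and later, distplot still runs but emits a deprecation warning, since histplot and displot are its replacements.

```python
import matplotlib.pyplot as plt
import seaborn as sns

a = sns.load_dataset("tips")

# Two axes side by side: default call on the left, kde=False on the right.
fig, (ax_kde, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Default call: histogram plus the KDE curve.
sns.distplot(a["tip"], ax=ax_kde)
ax_kde.set_title("default (kde=True)")

# kde=False: histogram only, with counts on the vertical axis.
sns.distplot(a["tip"], kde=False, ax=ax_hist)
ax_hist.set_title("kde=False")

plt.tight_layout()
```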

Parameters of distplot():

Parameter    Its use
hist         Whether to draw the histogram. The default is True.
kde          Whether to draw the KDE curve. The default is True; set it to False to remove the KDE curve.
bins         Specification of the histogram bins. If unspecified, a reference rule is used to find a useful default.
rug          Whether to draw a rug plot, a small dash marking every single data point along the axis. The default is False.
color        The color of the plot.
vertical     The default is False. If set to True, the values are plotted on the y-axis instead of the x-axis.
norm_hist    The default is False. If set to True, the histogram shows a density instead of a count (this is implied when the KDE is drawn).
axlabel      A string used as the label of the data axis, which is at the bottom of the plot by default.
label        A string used as the legend label for the plot when the legend function is called.

Now let us vary these parameters one by one and plot the results.

  1. Let us set
    hist = False
  2. We have already seen kde = False in the plots above.
  3. Let us set
    bins = 100
  4. Now we shall add a rug plot to this plot by passing
    rug = True
  5. To change the color of the plot, we have to pass
    sns.distplot(a['tip'], color='red')
  6. By default the plot runs along the x-axis; to plot along the y-axis, pass
    sns.distplot(a['tip'], kde=False, vertical=True)
  7. As the scale of the plots above shows, the histogram gives the count of tips (from the dataset). To get the density instead of the count, we have to pass
    sns.distplot(a['tip'], kde=False, norm_hist=True)
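Steps 1, 3 and 4 above do not show a full call, so here is a sketch of what they might look like. It assumes a was loaded with load_dataset; the three-panel layout is only for comparison and is not part of the original steps. On Seaborn 0.11 and later, each distplot call emits a deprecation warning.

```python
import matplotlib.pyplot as plt
import seaborn as sns

a = sns.load_dataset("tips")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Step 1: hide the histogram, leaving only the KDE curve.
sns.distplot(a["tip"], hist=False, ax=axes[0])
axes[0].set_title("hist=False")

# Step 3: use 100 bins instead of the automatically chosen number.
sns.distplot(a["tip"], kde=False, bins=100, ax=axes[1])
axes[1].set_title("bins=100")

# Step 4: add a rug plot, one small dash per observation.
sns.distplot(a["tip"], kde=False, rug=True, ax=axes[2])
axes[2].set_title("rug=True")

plt.tight_layout()
```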

These were a few of the important parameters of distplot().
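On Seaborn 0.11 and later, distplot is deprecated, and similar plots can be produced with histplot, kdeplot and rugplot. The mapping below is a sketch of the closest equivalents under that assumption, not an exact one-to-one translation; defaults such as the bin rule differ between the old and new functions.

```python
import matplotlib.pyplot as plt
import seaborn as sns

a = sns.load_dataset("tips")

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Roughly distplot(a["tip"], kde=False, bins=100, color="red").
sns.histplot(x=a["tip"], bins=100, color="red", ax=axes[0, 0])

# Roughly distplot(a["tip"], kde=False, vertical=True):
# pass the data as y instead of x.
sns.histplot(y=a["tip"], ax=axes[0, 1])

# Roughly distplot(a["tip"], kde=False, norm_hist=True):
# stat="density" scales the bars so that their total area is 1.
sns.histplot(x=a["tip"], stat="density", ax=axes[1, 0])

# Roughly distplot(a["tip"], hist=False, rug=True):
# draw the KDE curve and the rug with separate calls on the same axes.
sns.kdeplot(x=a["tip"], ax=axes[1, 1])
sns.rugplot(x=a["tip"], ax=axes[1, 1])

plt.tight_layout()
```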